Exploring EVENTS

Experiments

    1. Visualising Events Dataframe
    2. Exploring Tags Events
    3. Calculating Events Description Similarity
    4. Calculating Events Description Topic Modelling
    5. Exploring the Schedules of Events
      • 5.1 Getting the Frequency of Starting Dates of Events Schedules
      • 5.2 Getting the Frequency of End Dates of Events Schedules
    6. Exploring the Performances Tickets of Events Schedules
      • 6.1 Getting the Frequency of Price Tickets
      • 6.2 Getting the frequency of type (Standard, Children) tickets
      • 6.3 Exploring Performances Places - ATTENTION: Merging information with "places" dataframe!
        • 6.3.1 Frequency of Performances per town
        • 6.3.2 Frequency of Type tickets per town
        • 6.3.3 Frequency of Price tickets type per town
        • 6.3.4 Frequency of Max_Price tickets per town
          • 6.3.4.1 Frequency of Free tickets per town
          • 6.3.4.2 Frequency of No Free tickets per town
      • 6.4 Selecting Scottish Cities: Edinburgh, Glasgow, Dundee, Perth, Inverness, Aberdeen, St Andrews
        • 6.4.1 Frequency of Price Tickets per Scottish City
        • 6.4.2 Frequency of Type Tickets per Scottish City
        • 6.4.3 Frequency of Schedules Dates per Event and per Scottish City
        • 6.4.4 Grouping Schedules per Event and Scottish City
        • 6.4.5 Exploring Tags per Schedule and Scottish Cities
          • 6.4.5.1 Exploring the Frequency of schedules tags for Edinburgh
          • 6.4.5.2 Exploring the Frequency of schedules tags for Glasgow
        • 6.4.6 Histograms of starting/end schedules dates for Edinburgh
        • 6.4.7 Working with Schedule tags, Scottish cities, Starting/End Time
          • 6.4.7.1 Frequency of schedules Starting Date in Scottish City
          • 6.4.7.2 Frequency of schedules Ending Date in Scottish City
          • 6.4.7.3 Scheduled tags and Starting Dates in Scottish City
          • 6.4.7.4 Scheduled tags and Starting Dates in Scottish City

0. Importing libraries and loading the JSON file of events (11030 in this sample) into a dataframe

In [135]:
import json
import pandas as pd
import plotly.express as px
import os
from bertopic import BERTopic
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
import pickle
import plotly.graph_objects as go
import numpy as np
from gensim.parsing.preprocessing import remove_stopwords
import re
In [2]:
with open('dataset/sample_20190501.json', 'r') as f:
    data = json.load(f)
    print(len(data["events"]))
    events=data["events"]
df = pd.DataFrame(events)
11030

1. Visualizing the events dataframe

In [3]:
df
Out[3]:
event_id modified_ts created_ts name sort_name status id schedules descriptions tags category properties ranking_level ranking_in_level website phone_numbers alternative_names
0 232545 2020-03-15T12:18:05Z 2011-07-14T15:03:56Z Bright Club Bright Club live 232545 [{'start_ts': '2019-05-28T20:30:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... [Comedy, Days out, Glasgow City of Science, Sc... Comedy {'dropin_event': False, 'booking_essential': F... 2 1 NaN NaN NaN
1 345866 2019-06-09T12:51:49Z 2013-03-01T10:33:08Z Red Raw Red Raw live 345866 [{'start_ts': '2019-05-06T20:30:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... [Comedy, Red Raw, Stand-up] Comedy {'dropin_event': False, 'booking_essential': F... 3 2 NaN NaN NaN
2 347164 2020-03-23T07:05:08Z 2013-03-18T13:05:44Z The Saturday Show Saturday Show live 347164 [{'start_ts': '2019-05-04T20:30:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... [Comedy, Stand-up, The Saturday Show] Comedy {'dropin_event': False, 'booking_essential': F... 3 2 NaN NaN NaN
3 347313 2020-03-24T07:05:11Z 2013-03-21T12:44:28Z The Sunday Night Laugh-In Sunday Night Laugh-In live 347313 [{'start_ts': '2019-05-19T20:30:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... [Comedy, Stand-up, Sunday Night Laugh-In] Comedy {'dropin_event': False, 'booking_essential': F... 3 2 NaN NaN NaN
4 401143 2022-01-09T01:19:40Z 2014-04-10T03:16:09Z Al Murray: Landlord of Hope and Glory Al Murray: Landlord of Hope and Glory live 401143 [{'start_ts': '2019-08-02T17:00:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... [Comedy, Stand-up] Comedy {'dropin_event': False, 'booking_essential': F... 2 1 http://thepublandlord.com/ NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
11025 1417800 2019-10-22T10:15:20Z 2019-09-30T12:26:15Z Love Letters Love Letters live 1417800 [{'start_ts': '2019-10-26T19:30:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... [Play, Theatre] Theatre {'dropin_event': False, 'booking_essential': F... 3 2 NaN NaN NaN
11026 1419477 2019-10-31T01:38:47Z 2019-10-03T01:39:29Z Experiment 4042 - Powered by Milk Events Experiment 4042 - Powered by Milk Events live 1419477 [{'start_ts': '2019-10-31T22:30:00+00:00', 'en... [{'type': 'description.official', 'description... [Clubs, Hip Hop, R&B] Clubs {'list:importance': 'l'} 3 2 NaN NaN NaN
11027 1426790 2019-10-11T12:35:55Z 2019-10-10T17:10:20Z Walking Tour Walking Tour live 1426790 [{'start_ts': '2019-10-15T10:00:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... [Days out, Nature, Walks] Days out {'dropin_event': False, 'booking_essential': F... 3 2 NaN NaN NaN
11028 1427893 2019-10-17T12:34:08Z 2019-10-11T12:35:11Z Philly's Pub Quiz Philly's Pub Quiz live 1427893 [{'start_ts': '2019-10-24T18:00:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... [Activities, Days out, Pub Quiz] Days out {'dropin_event': False, 'booking_essential': F... 3 2 NaN NaN NaN
11029 1436517 2019-10-22T17:26:17Z 2019-10-22T15:44:23Z Scrapheap Golf Scrapheap Golf live 1436517 [{'start_ts': '2019-10-25T12:00:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... [Activities, Days out, Golf, Sport] Days out {'dropin_event': False, 'booking_essential': F... 3 1 NaN NaN NaN

11030 rows × 17 columns

In [4]:
## selecting some columns

Experiment 2: Exploring Tags Events

We are going to separate the elements stored in each tag list into new rows.

In [5]:
df["tags"][0:5]
Out[5]:
0    [Comedy, Days out, Glasgow City of Science, Sc...
1                          [Comedy, Red Raw, Stand-up]
2                [Comedy, Stand-up, The Saturday Show]
3            [Comedy, Stand-up, Sunday Night Laugh-In]
4                                   [Comedy, Stand-up]
Name: tags, dtype: object
In [6]:
df_tags=df.explode('tags')
In [7]:
df_tags
Out[7]:
event_id modified_ts created_ts name sort_name status id schedules descriptions tags category properties ranking_level ranking_in_level website phone_numbers alternative_names
0 232545 2020-03-15T12:18:05Z 2011-07-14T15:03:56Z Bright Club Bright Club live 232545 [{'start_ts': '2019-05-28T20:30:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... Comedy Comedy {'dropin_event': False, 'booking_essential': F... 2 1 NaN NaN NaN
0 232545 2020-03-15T12:18:05Z 2011-07-14T15:03:56Z Bright Club Bright Club live 232545 [{'start_ts': '2019-05-28T20:30:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... Days out Comedy {'dropin_event': False, 'booking_essential': F... 2 1 NaN NaN NaN
0 232545 2020-03-15T12:18:05Z 2011-07-14T15:03:56Z Bright Club Bright Club live 232545 [{'start_ts': '2019-05-28T20:30:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... Glasgow City of Science Comedy {'dropin_event': False, 'booking_essential': F... 2 1 NaN NaN NaN
0 232545 2020-03-15T12:18:05Z 2011-07-14T15:03:56Z Bright Club Bright Club live 232545 [{'start_ts': '2019-05-28T20:30:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... Science Comedy {'dropin_event': False, 'booking_essential': F... 2 1 NaN NaN NaN
0 232545 2020-03-15T12:18:05Z 2011-07-14T15:03:56Z Bright Club Bright Club live 232545 [{'start_ts': '2019-05-28T20:30:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... Stand-up Comedy {'dropin_event': False, 'booking_essential': F... 2 1 NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
11028 1427893 2019-10-17T12:34:08Z 2019-10-11T12:35:11Z Philly's Pub Quiz Philly's Pub Quiz live 1427893 [{'start_ts': '2019-10-24T18:00:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... Pub Quiz Days out {'dropin_event': False, 'booking_essential': F... 3 2 NaN NaN NaN
11029 1436517 2019-10-22T17:26:17Z 2019-10-22T15:44:23Z Scrapheap Golf Scrapheap Golf live 1436517 [{'start_ts': '2019-10-25T12:00:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... Activities Days out {'dropin_event': False, 'booking_essential': F... 3 1 NaN NaN NaN
11029 1436517 2019-10-22T17:26:17Z 2019-10-22T15:44:23Z Scrapheap Golf Scrapheap Golf live 1436517 [{'start_ts': '2019-10-25T12:00:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... Days out Days out {'dropin_event': False, 'booking_essential': F... 3 1 NaN NaN NaN
11029 1436517 2019-10-22T17:26:17Z 2019-10-22T15:44:23Z Scrapheap Golf Scrapheap Golf live 1436517 [{'start_ts': '2019-10-25T12:00:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... Golf Days out {'dropin_event': False, 'booking_essential': F... 3 1 NaN NaN NaN
11029 1436517 2019-10-22T17:26:17Z 2019-10-22T15:44:23Z Scrapheap Golf Scrapheap Golf live 1436517 [{'start_ts': '2019-10-25T12:00:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... Sport Days out {'dropin_event': False, 'booking_essential': F... 3 1 NaN NaN NaN

25416 rows × 17 columns

In [8]:
g_tags=df_tags.groupby(['tags']).size().reset_index()
g_tags=g_tags.rename(columns={0: "number_of_times"}).sort_values(by=['number_of_times'], ascending=False)
g_tags
Out[8]:
tags number_of_times
463 Music 2492
147 Comedy 2432
727 Theatre 1998
86 Books 1413
191 Days out 1107
... ... ...
373 Inspiration 1
372 Innerleithen Music Festival 1
371 Industrial rock 1
369 Indiana Jones 1
414 Lloyd Cole 1

828 rows × 2 columns
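The same counting can be sketched more compactly with explode followed by value_counts. The toy dataframe and its values below are made up for illustration; only the column names tags and number_of_times follow the notebook. Note that an event with an empty tag list becomes a single NaN row after explode, and value_counts leaves it out by default.

```python
import pandas as pd

# Hypothetical mini version of the events dataframe (not the real data)
toy = pd.DataFrame({
    "event_id": [1, 2, 3, 4],
    "tags": [["Comedy", "Stand-up"], ["Comedy"], ["Music", "Comedy"], []],
})

# One row per (event, tag); the empty list turns into one NaN row
toy_tags = toy.explode("tags")

# Count each tag, most frequent first (NaN is dropped by value_counts)
tag_counts = (
    toy_tags["tags"]
    .value_counts()
    .rename_axis("tags")
    .reset_index(name="number_of_times")
)
print(tag_counts)
```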

In [9]:
fig = px.line(g_tags, x="tags", y="number_of_times", title='Number of times that each tag appears')
fig.show()

Experiment 3: Description Similarity

Exploding the descriptions column

Each descriptions cell holds a list of descriptions, so we create one new row per element of that list.

In [10]:
df["descriptions"][0:5]
Out[10]:
0    [{'type': 'description.list.default', 'descrip...
1    [{'type': 'description.list.default', 'descrip...
2    [{'type': 'description.list.default', 'descrip...
3    [{'type': 'description.list.default', 'descrip...
4    [{'type': 'description.list.default', 'descrip...
Name: descriptions, dtype: object
In [11]:
df_descriptions=df.explode('descriptions')
In [12]:
df_d=pd.concat([df_descriptions.drop(['descriptions'], axis=1), df_descriptions['descriptions'].apply(pd.Series)], axis=1)
In [13]:
df_desc=df_d[["event_id", "description"]]
In [14]:
df_desc
Out[14]:
event_id description
0 232545 Hardworking staff of the city's universities a...
0 232545 Bright Club's unique blend of comedy and acade...
1 345866 The Stand's spankingly good new talent night, ...
1 345866 Our long-running weekly beginner's showcase is...
2 347164 Saturday nights à la Stand are normally a sold...
... ... ...
11026 1419477 Halloween at 4042 Grindlay Street. Thursday, O...
11027 1426790 A friendly and welcoming walking group that me...
11028 1427893 Weekly quiz night with free entry.
11029 1436517 Play at this construction site themed indoor g...
11029 1436517 Scrapheap Golf is here to bulldoze your boredo...

17181 rows × 2 columns
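As a possible alternative to apply(pd.Series), which can be slow on large frames, the dictionaries produced by explode can be expanded with pd.json_normalize. This is only a sketch on made-up data: the toy events and their texts are hypothetical, and unlike the cells above it drops events without any description before expanding.

```python
import pandas as pd

# Hypothetical events; only the field names follow the dataset
toy_events = pd.DataFrame([
    {"event_id": 1, "descriptions": [
        {"type": "description.list.default", "description": "A"},
        {"type": "description.official", "description": "B"},
    ]},
    {"event_id": 2, "descriptions": [
        {"type": "description.list.default", "description": "C"},
    ]},
    {"event_id": 3, "descriptions": []},  # no description at all
])

# One row per description; the empty list becomes NaN and is dropped here
exploded = (
    toy_events.explode("descriptions")
    .dropna(subset=["descriptions"])
    .reset_index(drop=True)
)

# Turn each dictionary into columns and put event_id back next to them
fields = pd.json_normalize(exploded["descriptions"].tolist())
toy_desc = pd.concat([exploded[["event_id"]], fields], axis=1)
print(toy_desc)
```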

Finding events with similar descriptions - Deep Learning - Transformers

In [15]:
# removing the rows whose description is empty
df_desc1=df_desc.dropna(subset=['description']).reset_index()
In [16]:
df_desc1[0:5]
Out[16]:
index event_id description
0 0 232545 Hardworking staff of the city's universities a...
1 0 232545 Bright Club's unique blend of comedy and acade...
2 1 345866 The Stand's spankingly good new talent night, ...
3 1 345866 Our long-running weekly beginner's showcase is...
4 2 347164 Saturday nights à la Stand are normally a sold...
In [17]:
# total number of rows with descriptions
df_desc1.shape[0]
Out[17]:
17063
In [18]:
# selecting the description column
documents=df_desc1["description"].values
In [20]:
#d=documents[0:100]
In [136]:
def clean_documents(text):
    text = re.sub(r'\S*@\S*\s?', '', text, flags=re.MULTILINE) # remove email
    text = re.sub(r'http\S+', '', text, flags=re.MULTILINE) # remove web addresses
    text = re.sub("\'", "", text) # remove single quotes
    text = remove_stopwords(text)
    return text
We store the cleaned documents in d
In [137]:
d=[]
for text in documents:
    d.append(clean_documents(text))
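To see what the regular expressions in clean_documents do, here is a small sketch that applies the same three substitutions to a made-up sentence. The stop-word removal step is left out so that the snippet only needs the standard library; the helper name strip_contacts_and_quotes, the sample text and the final whitespace clean-up are assumptions, not part of the notebook.

```python
import re

def strip_contacts_and_quotes(text):
    """Same three substitutions as clean_documents, without stop-word removal."""
    text = re.sub(r'\S*@\S*\s?', '', text)   # remove e-mail addresses
    text = re.sub(r'http\S+', '', text)      # remove web addresses
    text = re.sub("'", "", text)             # remove single quotes
    # Extra step (not in the notebook): collapse the gaps left behind
    return " ".join(text.split())

sample = "Book at info@example.com or visit https://example.com/tickets now. It's free!"
cleaned = strip_contacts_and_quotes(sample)
print(cleaned)
```

Removing the apostrophe turns "It's" into "Its", which is worth keeping in mind when reading the topic keywords later on.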
In [138]:
# Using all-MiniLM-L6-v2 Transformer
model = SentenceTransformer('all-MiniLM-L6-v2')
In [139]:
# Computing text_embeddings for the cleaned descriptions with the pretrained all-MiniLM-L6-v2 Transformer (encode does not train the model)
text_embeddings = model.encode(d, batch_size = 8, show_progress_bar = True)

In [140]:
np.shape(text_embeddings)
Out[140]:
(17063, 384)
In [141]:
### A small example of how to get an embedding vector from a description
In [142]:
first_description=df_desc1["description"].iloc[0]
first_description
first_description_embedding= model.encode(first_description, batch_size = 8, show_progress_bar = True)

Finding the similarity between documents

In [143]:
similarity_def=cosine_similarity(
    [first_description_embedding],
    text_embeddings)
In [144]:
similarities = cosine_similarity(text_embeddings)
print('pairwise dense output:\n {}\n'.format(similarities))
pairwise dense output:
 [[0.99999994 0.517641   0.28011447 ... 0.43325818 0.18076803 0.24013183]
 [0.517641   1.0000002  0.404719   ... 0.3619971  0.14919528 0.20264208]
 [0.28011447 0.404719   1.         ... 0.32132164 0.14099695 0.18206637]
 ...
 [0.43325818 0.3619971  0.32132164 ... 0.9999999  0.2486209  0.23398823]
 [0.18076803 0.14919528 0.14099695 ... 0.2486209  1.0000001  0.50518835]
 [0.24013183 0.20264208 0.18206637 ... 0.23398823 0.50518835 0.99999964]]
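For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below reproduces that calculation with plain NumPy on three made-up 2-dimensional vectors (the function name and the toy vectors are hypothetical, and it assumes no vector is all zeros). Every document has similarity 1 with itself; the values such as 0.99999994 on the diagonal above are presumably float32 rounding.

```python
import numpy as np

def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity between the rows of `vectors`."""
    vectors = np.asarray(vectors, dtype=np.float64)
    # Scale every row to unit length (assumes no all-zero row)
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / norms
    # The dot product of unit vectors is the cosine of their angle
    return unit @ unit.T

toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
sim = cosine_similarity_matrix(toy_embeddings)
print(np.round(sim, 4))
```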

In [145]:
similarities_sorted = similarities.argsort()
similarities_sorted
Out[145]:
array([[ 1195,   908,  1324, ...,  5990,  5991,     0],
       [ 3663,  2005,  2006, ...,  5991,  5990,     1],
       [ 5170,  3645,  3634, ..., 15265, 15266,     2],
       ...,
       [ 7274,  3645,  3636, ..., 15413, 16421, 17060],
       [ 2275, 10919, 10918, ...,  7570, 14960, 17061],
       [ 2275,  1886,   433, ...,  4806, 17061, 17062]])
In [146]:
id_1 = []
id_2 = []
score = []
for index,array in enumerate(similarities_sorted):
    p=len(array)
    id_1.append(index)
    id_2.append(array[-2])
    score.append(similarities[index][array[-2]])
index_df = pd.DataFrame({'id_1' : id_1,
                          'id_2' : id_2,
                          'score' : score})
print(p)
17063
In [147]:
index_df
Out[147]:
id_1 id_2 score
0 0 5991 0.581011
1 1 5990 0.722513
2 2 15266 0.592017
3 3 9507 0.711753
4 4 15352 0.545395
... ... ... ...
17058 17058 17057 1.000000
17059 17059 7542 0.529714
17060 17060 16421 0.676777
17061 17061 14960 0.587318
17062 17062 17061 0.505188

17063 rows × 3 columns
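Taking array[-2] assumes that each document is its own last entry after sorting. When two descriptions are identical (score 1.0, as for rows 17057 and 17058 above) that may not hold, because ties can be ordered either way. A hedged alternative is to mask the diagonal before looking for the maximum; the toy matrix below is made up, with documents 0 and 1 playing the role of duplicates.

```python
import numpy as np

# Hypothetical similarity matrix; documents 0 and 1 are identical texts
toy_sim = np.array([
    [1.0, 1.0, 0.2],
    [1.0, 1.0, 0.2],
    [0.2, 0.2, 1.0],
])

# Work on a copy and rule out "document is most similar to itself"
masked = toy_sim.copy()
np.fill_diagonal(masked, -np.inf)

# Index and score of the closest other document for every row
nearest = masked.argmax(axis=1)
nearest_score = masked[np.arange(len(masked)), nearest]
print(nearest, nearest_score)
```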

Finding the 10 most similar descriptions to document 3

In [148]:
## Let's take document 3
doc_index = 3
documents[doc_index]
Out[148]:
"Our long-running weekly beginner's showcase is regarded as the best open mic night in the UK. Catch up to ten new acts – some treading the boards for the very first time. This is where everyone starts and it's your chance to see the stars of tomorrow today.  Watch out for older hands dropping in to try out new material too."
In [149]:
results={}
for i in range(-2, -12, -1):
    similar_index=similarities_sorted[doc_index][i]
    rank=similarities[doc_index][similar_index]
    results[similar_index]=[rank]
In [150]:
results
Out[150]:
{9507: [0.71175265],
 16396: [0.65043193],
 5450: [0.64066726],
 1613: [0.5613535],
 286: [0.54780346],
 285: [0.54780346],
 9019: [0.5470681],
 9018: [0.5470681],
 14858: [0.5443444],
 14859: [0.5443444]}
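The loop above can be wrapped in a small reusable helper. This is a sketch under assumptions: the name top_k_similar and the toy matrix are hypothetical, and the document itself is excluded explicitly instead of relying on its position in the sorted row. It expects k to be smaller than the number of documents.

```python
import numpy as np

def top_k_similar(similarity_matrix, doc_index, k=10):
    """Return [(other_index, score), ...] for the k most similar documents."""
    row = np.asarray(similarity_matrix, dtype=float)[doc_index].copy()
    row[doc_index] = -np.inf             # never return the document itself
    order = np.argsort(row)[::-1][:k]    # highest scores first
    return [(int(i), float(row[i])) for i in order]

# Made-up 4 x 4 similarity matrix
toy_sim = np.array([
    [1.0, 0.9, 0.1, 0.5],
    [0.9, 1.0, 0.3, 0.4],
    [0.1, 0.3, 1.0, 0.2],
    [0.5, 0.4, 0.2, 1.0],
])
print(top_k_similar(toy_sim, doc_index=0, k=2))
```

Pairs with identical scores in the results above (for example 285 and 286) most likely come from the same text stored under two description types, so a real ranking may want to drop duplicated descriptions first.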

Experiment 4: Description Topic Modelling - Deep Learning - BERTopic

Let's run topic modelling on our descriptions. We are going to use the text_embeddings calculated in the previous phase.

In [151]:
len(documents)
Out[151]:
17063

Attention: using the "cleaned" documents (d)

In [152]:
topic_model = BERTopic(min_topic_size=20).fit(d, text_embeddings)
In [169]:
topics, probs = topic_model.transform(d, text_embeddings)

Visualizing our topics

In [170]:
topic_model.visualize_topics()
In [171]:
#### Visualizing the first 5 keywords of our first 5 topics
In [172]:
topic_model.visualize_barchart()

Visualizing the similarity between topics

In [173]:
topic_model.visualize_heatmap()

Getting the frequency of each topic.

Topic -1 groups the outlier documents that BERTopic could not assign to any topic, so we ignore it when reading the frequencies.

In [174]:
# Let's see the frequency of the first 10 topics
topic_model.get_topic_freq()[0:10]
Out[174]:
Topic Count
0 -1 6833
1 0 2757
2 1 930
3 2 243
4 3 200
5 4 180
6 5 174
7 6 162
8 7 160
9 8 157
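Leaving topic -1 out can be done with an ordinary pandas filter. The sketch below rebuilds the first rows of the table above by hand instead of calling BERTopic, so the small dataframe is only a stand-in for the result of get_topic_freq; the counts and the total of 17063 descriptions are taken from the outputs above.

```python
import pandas as pd

# First rows of the frequency table shown above (stand-in for get_topic_freq())
topic_freq = pd.DataFrame({"Topic": [-1, 0, 1, 2], "Count": [6833, 2757, 930, 243]})
total_documents = 17063  # number of descriptions given to the topic model

# Keep only real topics
assigned = topic_freq[topic_freq["Topic"] != -1]

# How many descriptions ended up as outliers
outlier_count = int(topic_freq.loc[topic_freq["Topic"] == -1, "Count"].sum())
outlier_share = outlier_count / total_documents

print(assigned)
print(f"{outlier_share:.1%} of the descriptions are outliers")
```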
In [175]:
print("Number of topics found %s" %len(topic_model.get_topic_freq()))
Number of topics found 112

Visualizing the keywords of our topics.

In [176]:
#topic_model.get_topics()
In [177]:
document_6_topic=topics[6]
print("The topic of the document 6 is %s " %document_6_topic)
The topic of the document 6 is 0 
In [178]:
topic_model.get_topic(0)
Out[178]:
[('comedy', 0.013254166529224672),
 ('languageswearing', 0.010820931419055982),
 ('category', 0.010504200037712233),
 ('strong', 0.010445936431425324),
 ('age', 0.00948479386261396),
 ('standup', 0.008714802945022174),
 ('funny', 0.008683585671258508),
 ('comedian', 0.008579765979973342),
 ('16', 0.008092164336742468),
 ('fringe', 0.007915349117561442)]
In [179]:
df_desc1["description"].iloc[6]
Out[179]:
'End the week with the generally very chilled Sunday offering, with a dolly mixture of comedians each time.'

Experiment 5: Exploring the Schedules of Events

  • 1 Event can have 1 to N Schedules.
  • 1 Schedule is in 1 Place
  • 1 Schedule can have 1 to N Performances
  • 1 Performance can have 1 to N Tickets
  • 1 Ticket has a max_price, min_price, currency.
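The nesting listed above can be pictured with a tiny hand-written event. The sketch below uses plain Python on made-up values (only the field names come from the dataset) and walks the three levels to produce one flat row per ticket, which is the shape the successive explode steps aim for.

```python
# Hypothetical event with the nesting described above (values are made up)
toy_event = {
    "event_id": 1,
    "schedules": [
        {
            "place_id": 10,
            "performances": [
                {"ts": "2019-05-28T20:30:00+01:00",
                 "tickets": [
                     {"type": "Standard", "min_price": 5.0, "max_price": 8.0, "currency": "GBP"},
                     {"type": "Concession", "min_price": 4.0, "max_price": 6.0, "currency": "GBP"},
                 ]},
                {"ts": "2019-07-23T20:30:00+01:00",
                 "tickets": [
                     {"type": "Standard", "min_price": 5.0, "max_price": 8.0, "currency": "GBP"},
                 ]},
            ],
        },
    ],
}

# Event -> schedule -> performance -> ticket: one flat row per ticket
ticket_rows = [
    {"event_id": toy_event["event_id"],
     "place_id": schedule["place_id"],
     "ts": performance["ts"],
     **ticket}
    for schedule in toy_event["schedules"]
    for performance in schedule["performances"]
    for ticket in performance["tickets"]
]

for row in ticket_rows:
    print(row)
```

A list of rows like this can be passed directly to pd.DataFrame if a dataframe is needed.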

Let's start by exploding the schedules column.

In [50]:
df["schedules"]
Out[50]:
0        [{'start_ts': '2019-05-28T20:30:00+01:00', 'en...
1        [{'start_ts': '2019-05-06T20:30:00+01:00', 'en...
2        [{'start_ts': '2019-05-04T20:30:00+01:00', 'en...
3        [{'start_ts': '2019-05-19T20:30:00+01:00', 'en...
4        [{'start_ts': '2019-08-02T17:00:00+01:00', 'en...
                               ...                        
11025    [{'start_ts': '2019-10-26T19:30:00+01:00', 'en...
11026    [{'start_ts': '2019-10-31T22:30:00+00:00', 'en...
11027    [{'start_ts': '2019-10-15T10:00:00+01:00', 'en...
11028    [{'start_ts': '2019-10-24T18:00:00+01:00', 'en...
11029    [{'start_ts': '2019-10-25T12:00:00+01:00', 'en...
Name: schedules, Length: 11030, dtype: object
In [51]:
df_schedules=df
# df_schedules is the same object as df, so this rename also changes df in place
df_schedules.rename(columns={'tags':'event_tags', 'name':'event_name', 'links':'event_links'}, inplace=True)
df_schedules=df.explode('schedules')
#df_schedules
df_s=pd.concat([df_schedules.drop(['schedules'], axis=1), df_schedules['schedules'].apply(pd.Series)], axis=1)
In [52]:
df_s.iloc[0]
Out[52]:
event_id                                                        232545
modified_ts                                       2020-03-15T12:18:05Z
created_ts                                        2011-07-14T15:03:56Z
event_name                                                 Bright Club
sort_name                                                  Bright Club
status                                                            live
id                                                              232545
descriptions         [{'type': 'description.list.default', 'descrip...
event_tags           [Comedy, Days out, Glasgow City of Science, Sc...
category                                                        Comedy
properties           {'dropin_event': False, 'booking_essential': F...
ranking_level                                                        2
ranking_in_level                                                     1
website                                                            NaN
phone_numbers                                                      NaN
alternative_names                                                  NaN
start_ts                                     2019-05-28T20:30:00+01:00
end_ts                                       2019-10-29T20:30:00+00:00
place_id                                                             1
performances         [{'ts': '2019-05-28T20:30:00+01:00', 'links': ...
performance_space                                                  NaN
phone_numbers                                                      NaN
Name: 0, dtype: object

Getting the Frequency of Starting Dates of Events Schedules

In [53]:
df_start=df_s.groupby([pd.to_datetime(df_s['start_ts'])]).size().reset_index()
df_start=df_start.rename(columns={0: "number_of_times"})
df_start=df_start.sort_values(by=['number_of_times'], ascending=False)
df_start.reset_index()
Out[53]:
index start_ts number_of_times
0 4 2019-05-01 10:00:00+01:00 44
1 1708 2019-07-31 19:00:00+01:00 23
2 2026 2019-08-02 19:00:00+01:00 23
3 1778 2019-08-01 12:00:00+01:00 22
4 1848 2019-08-01 18:00:00+01:00 22
... ... ... ...
4678 2097 2019-08-03 11:25:00+01:00 1
4679 2099 2019-08-03 11:40:00+01:00 1
4680 2107 2019-08-03 12:20:00+01:00 1
4681 2113 2019-08-03 12:55:00+01:00 1
4682 4682 2019-10-31 23:30:00+00:00 1

4683 rows × 3 columns
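The grouping above counts exact timestamps, so two schedules starting on the same day at different times fall into different groups. If a count per calendar day is wanted, one option is sketched below on three made-up timestamps. Parsing with utc=True avoids problems with the mixed +01:00 / +00:00 offsets, and converting to Europe/London is an assumption about which local calendar should be used.

```python
import pandas as pd

# Hypothetical start_ts values with mixed UTC offsets, as in the dataset
toy_starts = pd.Series([
    "2019-05-01T10:00:00+01:00",
    "2019-05-01T19:30:00+01:00",
    "2019-10-31T22:30:00+00:00",
])

# Parse everything to UTC first, then move to local time (assumed: UK)
parsed = pd.to_datetime(toy_starts, utc=True)
local = parsed.dt.tz_convert("Europe/London")

# Count schedules per calendar day
per_day = local.dt.strftime("%Y-%m-%d").value_counts().sort_index()
print(per_day)
```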

Visualizing the frequency of the schedule start timestamps computed above.

In [55]:
fig = px.histogram(df_start, x='start_ts', y="number_of_times", title="Frequency of Schedule Start Dates")
fig.show()

Getting the Frequency of End Dates of Events Schedules

In [56]:
df_end=df_s.groupby([pd.to_datetime(df_s['end_ts'])]).size().reset_index()
df_end=df_end.rename(columns={0: "number_of_times"})
df_end=df_end.sort_values(by=['number_of_times'], ascending=False)
df_end.reset_index()
fig = px.histogram(df_end, x='end_ts', y="number_of_times", title="Frequency of Schedule End Dates")
fig.show()

Experiment 6: Exploring the Performances Tickets of Events Schedules

  • 1 Event can have 1 to N Schedules.
  • 1 Schedule is in 1 Place
  • 1 Schedule can have 1 to N Performances
  • 1 Performance can have 1 to N Tickets
  • 1 Ticket has a max_price, min_price, currency.

Let's start by exploding the performances column. The performances are nested inside each schedule, so they can only be exploded after the schedules column has been exploded. For that reason we use the df_s dataframe, in which the schedules column has already been exploded.

In [57]:
df_s
Out[57]:
event_id modified_ts created_ts event_name sort_name status id descriptions event_tags category ... ranking_in_level website phone_numbers alternative_names start_ts end_ts place_id performances performance_space phone_numbers
0 232545 2020-03-15T12:18:05Z 2011-07-14T15:03:56Z Bright Club Bright Club live 232545 [{'type': 'description.list.default', 'descrip... [Comedy, Days out, Glasgow City of Science, Sc... Comedy ... 1 NaN NaN NaN 2019-05-28T20:30:00+01:00 2019-10-29T20:30:00+00:00 1 [{'ts': '2019-05-28T20:30:00+01:00', 'links': ... NaN NaN
1 345866 2019-06-09T12:51:49Z 2013-03-01T10:33:08Z Red Raw Red Raw live 345866 [{'type': 'description.list.default', 'descrip... [Comedy, Red Raw, Stand-up] Comedy ... 2 NaN NaN NaN 2019-05-06T20:30:00+01:00 2019-10-28T20:30:00+00:00 1 [{'ts': '2019-05-06T20:30:00+01:00', 'links': ... NaN NaN
2 347164 2020-03-23T07:05:08Z 2013-03-18T13:05:44Z The Saturday Show Saturday Show live 347164 [{'type': 'description.list.default', 'descrip... [Comedy, Stand-up, The Saturday Show] Comedy ... 2 NaN NaN NaN 2019-05-04T20:30:00+01:00 2019-10-26T20:30:00+01:00 1 [{'ts': '2019-05-04T20:30:00+01:00', 'links': ... NaN NaN
3 347313 2020-03-24T07:05:11Z 2013-03-21T12:44:28Z The Sunday Night Laugh-In Sunday Night Laugh-In live 347313 [{'type': 'description.list.default', 'descrip... [Comedy, Stand-up, Sunday Night Laugh-In] Comedy ... 2 NaN NaN NaN 2019-05-19T20:30:00+01:00 2019-10-27T20:30:00+00:00 1 [{'ts': '2019-06-30T20:30:00+01:00', 'links': ... NaN NaN
4 401143 2022-01-09T01:19:40Z 2014-04-10T03:16:09Z Al Murray: Landlord of Hope and Glory Al Murray: Landlord of Hope and Glory live 401143 [{'type': 'description.list.default', 'descrip... [Comedy, Stand-up] Comedy ... 1 http://thepublandlord.com/ NaN NaN 2019-08-02T17:00:00+01:00 2019-08-11T17:00:00+01:00 30443 [{'ts': '2019-08-02T17:00:00+01:00', 'duration... Palais du Variete NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
11025 1417800 2019-10-22T10:15:20Z 2019-09-30T12:26:15Z Love Letters Love Letters live 1417800 [{'type': 'description.list.default', 'descrip... [Play, Theatre] Theatre ... 2 NaN NaN NaN 2019-10-26T19:30:00+01:00 2019-10-26T19:30:00+01:00 122210 [{'ts': '2019-10-26T19:30:00+01:00', 'duration... NaN NaN
11026 1419477 2019-10-31T01:38:47Z 2019-10-03T01:39:29Z Experiment 4042 - Powered by Milk Events Experiment 4042 - Powered by Milk Events live 1419477 [{'type': 'description.official', 'description... [Clubs, Hip Hop, R&B] Clubs ... 2 NaN NaN NaN 2019-10-31T22:30:00+00:00 2019-10-31T22:30:00+00:00 122338 [{'ts': '2019-10-31T22:30:00+00:00', 'duration... NaN NaN
11027 1426790 2019-10-11T12:35:55Z 2019-10-10T17:10:20Z Walking Tour Walking Tour live 1426790 [{'type': 'description.list.default', 'descrip... [Days out, Nature, Walks] Days out ... 2 NaN NaN NaN 2019-10-15T10:00:00+01:00 2019-10-29T10:00:00+00:00 122712 [{'ts': '2019-10-15T10:00:00+01:00', 'duration... NaN NaN
11028 1427893 2019-10-17T12:34:08Z 2019-10-11T12:35:11Z Philly's Pub Quiz Philly's Pub Quiz live 1427893 [{'type': 'description.list.default', 'descrip... [Activities, Days out, Pub Quiz] Days out ... 2 NaN NaN NaN 2019-10-24T18:00:00+01:00 2019-10-31T18:00:00+00:00 122720 [{'ts': '2019-10-24T18:00:00+01:00', 'duration... NaN NaN
11029 1436517 2019-10-22T17:26:17Z 2019-10-22T15:44:23Z Scrapheap Golf Scrapheap Golf live 1436517 [{'type': 'description.list.default', 'descrip... [Activities, Days out, Golf, Sport] Days out ... 1 NaN NaN NaN 2019-10-25T12:00:00+01:00 2019-10-31T12:00:00+00:00 122919 [{'ts': '2019-10-25T12:00:00+01:00', 'duration... NaN NaN

12100 rows × 22 columns

In [58]:
a=df_s[["event_id", "event_name", "performances", "event_tags", "start_ts", "end_ts", "place_id"]]
df_p=a.explode("performances")
In [59]:
df_p
Out[59]:
event_id event_name performances event_tags start_ts end_ts place_id
0 232545 Bright Club {'ts': '2019-05-28T20:30:00+01:00', 'links': [... [Comedy, Days out, Glasgow City of Science, Sc... 2019-05-28T20:30:00+01:00 2019-10-29T20:30:00+00:00 1
0 232545 Bright Club {'ts': '2019-07-23T20:30:00+01:00', 'duration'... [Comedy, Days out, Glasgow City of Science, Sc... 2019-05-28T20:30:00+01:00 2019-10-29T20:30:00+00:00 1
0 232545 Bright Club {'ts': '2019-10-29T20:30:00+00:00', 'duration'... [Comedy, Days out, Glasgow City of Science, Sc... 2019-05-28T20:30:00+01:00 2019-10-29T20:30:00+00:00 1
1 345866 Red Raw {'ts': '2019-05-06T20:30:00+01:00', 'links': [... [Comedy, Red Raw, Stand-up] 2019-05-06T20:30:00+01:00 2019-10-28T20:30:00+00:00 1
1 345866 Red Raw {'ts': '2019-05-13T20:30:00+01:00', 'links': [... [Comedy, Red Raw, Stand-up] 2019-05-06T20:30:00+01:00 2019-10-28T20:30:00+00:00 1
... ... ... ... ... ... ... ...
11029 1436517 Scrapheap Golf {'ts': '2019-10-25T12:00:00+01:00', 'duration'... [Activities, Days out, Golf, Sport] 2019-10-25T12:00:00+01:00 2019-10-31T12:00:00+00:00 122919
11029 1436517 Scrapheap Golf {'ts': '2019-10-26T12:00:00+01:00', 'duration'... [Activities, Days out, Golf, Sport] 2019-10-25T12:00:00+01:00 2019-10-31T12:00:00+00:00 122919
11029 1436517 Scrapheap Golf {'ts': '2019-10-27T12:00:00+00:00', 'duration'... [Activities, Days out, Golf, Sport] 2019-10-25T12:00:00+01:00 2019-10-31T12:00:00+00:00 122919
11029 1436517 Scrapheap Golf {'ts': '2019-10-28T12:00:00+00:00', 'duration'... [Activities, Days out, Golf, Sport] 2019-10-25T12:00:00+01:00 2019-10-31T12:00:00+00:00 122919
11029 1436517 Scrapheap Golf {'ts': '2019-10-31T12:00:00+00:00', 'duration'... [Activities, Days out, Golf, Sport] 2019-10-25T12:00:00+01:00 2019-10-31T12:00:00+00:00 122919

105326 rows × 7 columns

In [60]:
df_p=pd.concat([df_p.drop(['performances'], axis=1), df_p['performances'].apply(pd.Series)], axis=1)
In [61]:
df_p[0:2]
Out[61]:
event_id event_name event_tags start_ts end_ts place_id ts links tickets duration descriptions properties time_unknown
0 232545 Bright Club [Comedy, Days out, Glasgow City of Science, Sc... 2019-05-28T20:30:00+01:00 2019-10-29T20:30:00+00:00 1 2019-05-28T20:30:00+01:00 [{'type': 'booking', 'url': 'http://www.thesta... [{'type': 'Standard', 'currency': 'GBP', 'min_... NaN NaN NaN NaN
0 232545 Bright Club [Comedy, Days out, Glasgow City of Science, Sc... 2019-05-28T20:30:00+01:00 2019-10-29T20:30:00+00:00 1 2019-07-23T20:30:00+01:00 [{'type': 'booking', 'url': 'https://www.thest... [{'type': 'Standard', 'currency': 'GBP', 'min_... 120.0 NaN NaN NaN

Exploring tickets

Now we have to explode the tickets column. First we remove the rows whose tickets information is empty.

In [62]:
df_p=df_p.dropna(subset=['tickets'])

Since we don't need all the columns, we select a few of them.

In [63]:
df_t=df_p[["event_id", "event_name", "descriptions", "event_tags", "tickets", "place_id", "start_ts", "end_ts"]]
In [64]:
df_t[0:5]
Out[64]:
event_id event_name descriptions event_tags tickets place_id start_ts end_ts
0 232545 Bright Club NaN [Comedy, Days out, Glasgow City of Science, Sc... [{'type': 'Standard', 'currency': 'GBP', 'min_... 1 2019-05-28T20:30:00+01:00 2019-10-29T20:30:00+00:00
0 232545 Bright Club NaN [Comedy, Days out, Glasgow City of Science, Sc... [{'type': 'Standard', 'currency': 'GBP', 'min_... 1 2019-05-28T20:30:00+01:00 2019-10-29T20:30:00+00:00
0 232545 Bright Club NaN [Comedy, Days out, Glasgow City of Science, Sc... [{'type': 'Standard', 'currency': 'GBP', 'desc... 1 2019-05-28T20:30:00+01:00 2019-10-29T20:30:00+00:00
1 345866 Red Raw [{'type': 'list.description.default', 'descrip... [Comedy, Red Raw, Stand-up] [{'type': 'Standard', 'currency': 'GBP', 'min_... 1 2019-05-06T20:30:00+01:00 2019-10-28T20:30:00+00:00
1 345866 Red Raw [{'type': 'list.description.default', 'descrip... [Comedy, Red Raw, Stand-up] [{'type': 'Standard', 'currency': 'GBP', 'min_... 1 2019-05-06T20:30:00+01:00 2019-10-28T20:30:00+00:00
In [65]:
df_t1=df_t.explode("tickets")

Now we convert the min and max ticket prices to numeric values. Missing prices are filled with 0.

In [66]:
df_tickets=pd.concat([df_t1.drop(['tickets'], axis=1), df_t1['tickets'].apply(pd.Series)], axis=1)
df_tickets['min_price'] = pd.to_numeric(df_tickets['min_price'])
df_tickets['max_price'] = pd.to_numeric(df_tickets['max_price'])
df_tickets['min_price']= df_tickets['min_price'].fillna(0)
df_tickets['max_price']= df_tickets['max_price'].fillna(0)
In [67]:
df_tickets[0:5]
Out[67]:
event_id event_name descriptions event_tags place_id start_ts end_ts 0 currency description max_price min_price type
0 232545 Bright Club NaN [Comedy, Days out, Glasgow City of Science, Sc... 1 2019-05-28T20:30:00+01:00 2019-10-29T20:30:00+00:00 NaN GBP NaN 0.0 5.0 Standard
0 232545 Bright Club NaN [Comedy, Days out, Glasgow City of Science, Sc... 1 2019-05-28T20:30:00+01:00 2019-10-29T20:30:00+00:00 NaN GBP NaN 0.0 5.0 Standard
0 232545 Bright Club NaN [Comedy, Days out, Glasgow City of Science, Sc... 1 2019-05-28T20:30:00+01:00 2019-10-29T20:30:00+00:00 NaN GBP £tbc 0.0 0.0 Standard
1 345866 Red Raw [{'type': 'list.description.default', 'descrip... [Comedy, Red Raw, Stand-up] 1 2019-05-06T20:30:00+01:00 2019-10-28T20:30:00+00:00 NaN GBP NaN 0.0 3.0 Standard
1 345866 Red Raw [{'type': 'list.description.default', 'descrip... [Comedy, Red Raw, Stand-up] 1 2019-05-06T20:30:00+01:00 2019-10-28T20:30:00+00:00 NaN GBP NaN 0.0 3.0 Standard
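
The explode-and-expand step above can also be sketched on toy data. This is a minimal illustration, not the notebook's own pipeline: the `toy` frame and the `max_price_missing` column are invented for the example, and `pd.json_normalize` is swapped in for `apply(pd.Series)`. It assumes every remaining entry is a dict, which is why rows without tickets are dropped first.

```python
import pandas as pd

# Hypothetical toy data: one row per event, each with a list of ticket dicts.
toy = pd.DataFrame({
    "event_id": [1, 2, 3],
    "tickets": [
        [{"type": "Standard", "min_price": "5.00", "max_price": None},
         {"type": "Concession", "min_price": "3.00", "max_price": "4.00"}],
        [{"type": "Standard", "min_price": None, "max_price": None}],
        [],  # an event with no tickets explodes to a single NaN row
    ],
})

# One row per ticket; drop events that had no ticket at all.
exploded = (
    toy.explode("tickets")
       .dropna(subset=["tickets"])
       .reset_index(drop=True)
)

# Expand the dicts into columns and put them next to the event columns.
expanded = pd.concat(
    [exploded.drop(columns="tickets"),
     pd.json_normalize(exploded["tickets"].tolist())],
    axis=1,
)

# Convert prices to numbers; anything unparseable becomes NaN.
for col in ("min_price", "max_price"):
    expanded[col] = pd.to_numeric(expanded[col], errors="coerce")

# Keep track of missing prices instead of silently turning them into 0.
expanded["max_price_missing"] = expanded["max_price"].isna()
print(expanded)
```

Keeping a flag such as `max_price_missing` makes it possible to tell a ticket that is really free from one whose price was simply not recorded.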

Experiment 6.1: Getting the Frequency of Price Tickets

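We work only with max_price. Because missing prices were filled with 0 in the previous step, the 0.0 bucket counted as "free" below also includes tickets with no recorded maximum price (for example, the Bright Club rows above have min_price 5.0 and max_price 0.0).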

In [68]:
g_maxp=df_tickets.groupby(['max_price']).size().reset_index()
g_maxp=g_maxp.rename(columns={0: "number_of_times"})
#g_maxp=g_maxp.sort_values(by=['number_of_times'], ascending=False)
# groupby sorts by max_price, so row 0 is the 0.0 ("free") bucket
free_tickets=g_maxp[0:1]
# Remove the free tickets before plotting
g_maxp=g_maxp.drop([0])
g_maxp[:]
Out[68]:
max_price number_of_times
1 2.00 6
2 3.00 3
3 4.00 5
4 5.00 164
5 5.50 1
... ... ...
199 230.00 1
200 320.00 1
201 346.50 1
202 348.08 1
203 425.00 1

203 rows × 2 columns

In [69]:
fig = px.line(g_maxp, x="max_price", y="number_of_times", title='Frequency of price tickets')
fig.show()
In [70]:
print("The number of free tickets is: %s" %free_tickets["number_of_times"].values[0])
The number of free tickets is: 159295
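
The frequency count above can be reproduced on a small example. This is a sketch with invented prices; it selects the free bucket by value rather than by row position, which is an alternative to the positional `[0:1]` / `drop([0])` used in the cell.

```python
import pandas as pd

# Hypothetical prices; 0.0 stands for "free" (or missing, after fillna(0)).
prices = pd.DataFrame({"max_price": [0.0, 0.0, 5.0, 5.0, 5.0, 12.5]})

# Count how often each price occurs.
freq = (
    prices.groupby("max_price")
          .size()
          .reset_index(name="number_of_times")
)

# Split by value, so the result does not depend on row order.
free = int(freq.loc[freq["max_price"] == 0, "number_of_times"].sum())
paid = freq[freq["max_price"] > 0]

print("free tickets:", free)
print(paid)
```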

Experiment 6.2: Getting the frequency of type (Standard, Children) tickets

In [133]:
tickets_type=df_tickets.groupby(['type']).size().reset_index()
tickets_type=tickets_type.rename(columns={0: "number_of_times"}).sort_values(by=['number_of_times'], ascending=False)
tickets_type
Out[133]:
type number_of_times
37 Standard 101323
17 Concession 42265
16 Children 7542
24 Family 2839
39 Students 1321
... ... ...
48 Up to 2 accompanying adults 1
47 Under-5s 1
41 Table of up to five people 1
40 Students / Unemployed 1
118 without book 1

119 rows × 2 columns

In [134]:
px.histogram(tickets_type, x="type", y="number_of_times", histfunc="sum", color="type", title='Frequency of type tickets')
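
With 119 ticket types, most of them occurring once, the histogram is dominated by a long tail. One option, sketched here, is to keep the most frequent types and fold the rest into a single "Other" row. The counts are copied from the output above for five of the types; `top_n` and the "Other" label are choices made for this example.

```python
import pandas as pd

# A few rows taken from the tickets_type output above.
tickets_type = pd.DataFrame({
    "type": ["Standard", "Concession", "Children", "Under-5s", "without book"],
    "number_of_times": [101323, 42265, 7542, 1, 1],
})

top_n = 3  # assumption: how many types to show individually
top = tickets_type.nlargest(top_n, "number_of_times")

# Everything outside the top_n types is summed into one row.
other_total = int(tickets_type["number_of_times"].sum() - top["number_of_times"].sum())
summary = pd.concat(
    [top, pd.DataFrame([{"type": "Other", "number_of_times": other_total}])],
    ignore_index=True,
)
print(summary)
```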

6.3 Exploring Performances Places

In [132]:
df_tickets["place_id"]
Out[132]:
0             1
0             1
0             1
1             1
1             1
          ...  
11029    122919
11029    122919
11029    122919
11029    122919
11029    122919
Name: place_id, Length: 161908, dtype: int64

Creating places dataframe

In [74]:
path="dataset/sample_20180501.json"
with open(path, 'r') as f:
    data = json.load(f)
places=data["places"]
print(len(places))
df_places = pd.DataFrame(places)
1224
In [75]:
df_place = df_tickets.merge(df_places, on='place_id')
In [76]:
df_place.shape[0]
Out[76]:
146147
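
The merged dataframe has 146147 rows, while df_tickets has 161908. If place_id is unique in the places data (an assumption, not checked here), the difference is the ticket rows whose place_id has no match in this places sample, which the default inner join drops. A left join with `indicator=True` makes those rows visible. The toy frames below are invented for illustration.

```python
import pandas as pd

# Hypothetical ticket rows; place_id 99 has no entry in the places table.
tickets = pd.DataFrame({
    "place_id": [1, 1, 2, 99],
    "type": ["Standard", "Concession", "Standard", "Standard"],
})
places = pd.DataFrame({
    "place_id": [1, 2],
    "town": ["Edinburgh", "St Andrews"],
})

# how="left" keeps every ticket row; indicator adds a "_merge" column;
# validate raises if place_id is not unique on the places side.
merged = tickets.merge(
    places, on="place_id", how="left",
    indicator=True, validate="many_to_one",
)

unmatched = merged[merged["_merge"] == "left_only"]
print(len(merged), "ticket rows,", len(unmatched), "without a matching place")
```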

6.3.1 Frequency of Performances per Town

In [77]:
df_town=df_place.dropna(subset=['town'])
town=df_town.groupby(['town']).size().reset_index()
town=town.rename(columns={0: "number_of_times"})
town=town.drop([0])
In [78]:
town=town.sort_values(by=['number_of_times'], ascending=False)
town
Out[78]:
town number_of_times
35 Edinburgh 131340
88 St Andrews 2737
95 Wilkieston 1014
76 Peebles 585
65 Melrose 561
... ... ...
66 Methil 1
64 Markinch 1
92 Thornton 1
62 Longniddry 1
60 Loanhead 1

97 rows × 2 columns

In [79]:
px.scatter(town, x="town",y='number_of_times', color='number_of_times', size="number_of_times", size_max=60, title="Frequency of Performances per Town")

6.3.2 Frequency of Type tickets per town

In [80]:
town_type=df_town.groupby(['town', 'type']).size().reset_index()
town_type=town_type.rename(columns={0: "number_of_times"})
town_type=town_type[town_type["town"]!=""]
In [81]:
town_type=town_type.sort_values(by=['number_of_times'], ascending=False)
town_type
Out[81]:
town type number_of_times
92 Edinburgh Standard 81618
79 Edinburgh Concession 38740
78 Edinburgh Children 3137
84 Edinburgh Family 2572
275 St Andrews Standard 1111
... ... ... ...
118 Edinburgh one accompanying adult 1
119 Edinburgh performers 1
229 North Berwick Family 1
228 North Berwick Concession 1
305 Yetholm Standard 1

305 rows × 3 columns

In [82]:
fig = px.scatter(town_type, x='town', y='type', color='number_of_times', title="Frequency of type tickets per town")
fig.show()
In [83]:
px.scatter(town_type, x="town",y='type', color='number_of_times', size="number_of_times", size_max=60, title="Frequency of performances type tickets per town")

6.3.3 Frequency of Max_Price tickets per town

In [84]:
a=df_town[["town", "max_price"]]
a=a[a["town"]!=""]
town_price=a.groupby(['town', 'max_price']).size().reset_index()
town_price=town_price.rename(columns={0: "number_of_times"})
town_price=town_price.sort_values(by=['number_of_times'], ascending=False)
town_price
Out[84]:
town max_price number_of_times
55 Edinburgh 0.00 129721
342 St Andrews 0.00 2725
358 Wilkieston 0.00 891
319 Peebles 0.00 581
289 Melrose 0.00 559
... ... ... ...
63 Edinburgh 6.60 1
220 Eyemouth 10.00 1
221 Eyemouth 12.00 1
62 Edinburgh 6.04 1
60 Edinburgh 5.50 1

362 rows × 3 columns

6.3.3.1 Frequency of free tickets per town

In [85]:
free_town_price=town_price[town_price["max_price"]== 0.0]
free_town_price
Out[85]:
town max_price number_of_times
55 Edinburgh 0.0 129721
342 St Andrews 0.0 2725
358 Wilkieston 0.0 891
319 Peebles 0.0 581
289 Melrose 0.0 559
... ... ... ...
307 Newtongrange 0.0 1
304 Newport on Tay 0.0 1
24 Craigrothie 0.0 1
292 Methil 0.0 1
49 Duns 0.0 1

94 rows × 3 columns

In [86]:
fig = px.bar(free_town_price, x='town', y='number_of_times', color='number_of_times', barmode='group', title="Frequency of Free Tickets per Town")
fig.show()

6.3.3.2 Frequency of non-free tickets per town

In [87]:
town_price=town_price[town_price["max_price"]!= 0.0]
town_price
Out[87]:
town max_price number_of_times
86 Edinburgh 11.00 196
67 Edinburgh 7.99 184
83 Edinburgh 10.00 136
359 Wilkieston 5.00 123
92 Edinburgh 12.00 121
... ... ... ...
63 Edinburgh 6.60 1
220 Eyemouth 10.00 1
221 Eyemouth 12.00 1
62 Edinburgh 6.04 1
60 Edinburgh 5.50 1

268 rows × 3 columns

In [88]:
fig = px.bar(town_price, x='town', y='max_price', color='number_of_times', barmode='group', title="Frequency of Price Tickets per Town")
fig.show()
In [89]:
town_price.groupby(["town"]).sum().sort_values(by=['max_price'], ascending=False)
Out[89]:
max_price number_of_times
town
Edinburgh 6769.78 1619
North Berwick 1435.58 9
Dunfermline 217.00 4
Musselburgh 186.50 17
Gorebridge 163.58 6
Penicuik 150.00 2
Peebles 140.00 4
Haddington 131.65 16
St Andrews 130.20 12
Linlithgow 127.25 5
Dunbar 117.25 6
Galashiels 97.98 18
Kirkcaldy 71.49 6
Lochgelly 62.50 2
Innerleithen 59.21 5
Hawick 57.00 2
Melrose 56.50 2
Bathgate 48.09 5
Prestonpans 43.50 3
Crail 43.00 3
St Monans 42.00 6
Ceres 40.00 1
Glenrothes 38.00 2
Dalkeith 35.48 15
Eyemouth 35.00 8
Newport on Tay 28.67 2
Livingston 28.50 87
Cupar 23.50 8
South Queensferry 23.00 1
Peeblesshire 21.79 1
East Linton 21.00 2
Dirleton 21.00 2
Pathhead 21.00 2
Kelso 18.43 1
Anstruther 17.00 3
Selkirk 8.00 8
Markinch 7.00 1
Armadale 5.00 3
Wilkieston 5.00 123
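
Note that the max_price column in the table above adds each distinct price once per town, regardless of how many tickets carry that price. If an average price per ticket is wanted instead, a count-weighted mean is one option. The sketch below uses invented numbers and the hypothetical column names `total` and `mean_max_price`.

```python
import pandas as pd

# Hypothetical (town, max_price, number_of_times) rows in the same shape as town_price.
town_price = pd.DataFrame({
    "town": ["Edinburgh", "Edinburgh", "St Andrews"],
    "max_price": [10.0, 20.0, 5.0],
    "number_of_times": [3, 1, 2],
})

# Weight each price by the number of tickets that have it.
weighted = (
    town_price.assign(total=town_price["max_price"] * town_price["number_of_times"])
              .groupby("town")[["total", "number_of_times"]]
              .sum()
)
weighted["mean_max_price"] = weighted["total"] / weighted["number_of_times"]
print(weighted.sort_values("mean_max_price", ascending=False))
```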

6.4 Selecting Scottish Cities: Edinburgh, Glasgow, Dundee, Perth, Inverness, Aberdeen, St Andrews

6.4.1 Frequency of Price Tickets per Scottish City

In [90]:
scot_towns_price=town_price[town_price['town'].isin(["Edinburgh", "Glasgow", "Perth", "Inverness", "Dundee", "St Andrews", "Aberdeen"])]
In [91]:
scot_towns_price[0:10]
Out[91]:
town max_price number_of_times
86 Edinburgh 11.00 196
67 Edinburgh 7.99 184
83 Edinburgh 10.00 136
92 Edinburgh 12.00 121
213 Edinburgh 165.00 51
68 Edinburgh 8.00 46
152 Edinburgh 33.00 41
59 Edinburgh 5.00 35
89 Edinburgh 11.29 33
93 Edinburgh 12.50 32
In [92]:
fig = px.bar(scot_towns_price, x='town', y='max_price', color='number_of_times', barmode='group', title="Frequency of Price Tickets per Scottish City")
fig.show()
In [93]:
scot_towns_price.groupby(["town"]).sum().sort_values(by=['max_price'], ascending=False)
Out[93]:
max_price number_of_times
town
Edinburgh 6769.78 1619
St Andrews 130.20 12
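
Only Edinburgh and St Andrews appear in the result above, although seven cities were requested. A small check, sketched here on invented data, reports which of the requested cities are present. `SCOTTISH_CITIES` is a name introduced for this example; defining the list once also avoids repeating it in each cell.

```python
import pandas as pd

SCOTTISH_CITIES = ["Edinburgh", "Glasgow", "Perth", "Inverness",
                   "Dundee", "St Andrews", "Aberdeen"]

# Hypothetical rows in the same shape as town_price.
town_price = pd.DataFrame({
    "town": ["Edinburgh", "St Andrews", "Peebles"],
    "max_price": [11.0, 8.0, 5.0],
    "number_of_times": [196, 4, 2],
})

# Keep only the requested cities.
scot = town_price[town_price["town"].isin(SCOTTISH_CITIES)]

# Report which requested cities are present and which are absent.
present = sorted(set(SCOTTISH_CITIES) & set(town_price["town"]))
missing = sorted(set(SCOTTISH_CITIES) - set(town_price["town"]))
print("present:", present)
print("missing:", missing)
```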

6.4.2 Frequency of Type Tickets per Scottish City

In [94]:
scot_towns_type=town_type[town_type['town'].isin(["Edinburgh", "Glasgow", "Perth", "Inverness", "Dundee", "St Andrews", "Aberdeen"])]
In [95]:
scot_towns_type[0:10]
Out[95]:
town type number_of_times
92 Edinburgh Standard 81618
79 Edinburgh Concession 38740
78 Edinburgh Children 3137
84 Edinburgh Family 2572
275 St Andrews Standard 1111
272 St Andrews Children 781
276 St Andrews Students 738
131 Edinburgh under 5s 550
91 Edinburgh Seniors 378
94 Edinburgh Students 337
In [96]:
fig = px.bar(scot_towns_type, x='town', y='number_of_times', color='type', barmode='group', title="Frequency of Type Tickets per Scottish City")
fig.show()
In [97]:
scot_towns_type.groupby(["town"]).sum()
Out[97]:
number_of_times
town
Edinburgh 127979
St Andrews 2708
In [98]:
df_place.loc[0]
Out[98]:
event_id                                                     232545
event_name                                              Bright Club
descriptions_x                                                  NaN
event_tags        [Comedy, Days out, Glasgow City of Science, Sc...
place_id                                                          1
start_ts                                  2019-05-28T20:30:00+01:00
end_ts                                    2019-10-29T20:30:00+00:00
0                                                               NaN
currency                                                        GBP
description                                                     NaN
max_price                                                       0.0
min_price                                                       5.0
type                                                       Standard
address                                                5 York Place
email                                          admin@thestand.co.uk
postal_code                                                 EH1 3EB
properties        {'place.child-restrictions': True, 'place.faci...
sort_name                                                     Stand
town                                                      Edinburgh
website                                   http://www.thestand.co.uk
modified_ts                                    2021-11-24T12:18:33Z
created_ts                                     2021-11-24T12:18:33Z
name                                                      The Stand
loc               {'latitude': '55.955806109395006', 'longitude'...
country_code                                                     GB
tags                  [Bar & pub food, Comedy, Restaurants, Venues]
descriptions_y    [{'type': 'description.list.default', 'descrip...
phone_numbers     {'info': '0131 558 7272', 'box_office': '0131 ...
status                                                         live
Name: 0, dtype: object

6.4.3 Frequency of Schedule Dates per Event and per Scottish City

In [99]:
# Keep only rows with a known town, then select the Scottish cities
df_place2=df_place.dropna(subset=['town'])
df_scott=df_place2[df_place2['town'].isin(["Edinburgh", "Glasgow", "Perth", "Inverness", "Dundee", "St Andrews", "Aberdeen"])]
df_scott=df_scott[["event_id", "event_name", "event_tags", "town", "start_ts", "end_ts"]]
df_scott[0:3]
Out[99]:
event_id event_name event_tags town start_ts end_ts
0 232545 Bright Club [Comedy, Days out, Glasgow City of Science, Sc... Edinburgh 2019-05-28T20:30:00+01:00 2019-10-29T20:30:00+00:00
1 232545 Bright Club [Comedy, Days out, Glasgow City of Science, Sc... Edinburgh 2019-05-28T20:30:00+01:00 2019-10-29T20:30:00+00:00
2 232545 Bright Club [Comedy, Days out, Glasgow City of Science, Sc... Edinburgh 2019-05-28T20:30:00+01:00 2019-10-29T20:30:00+00:00

Note: An event can have several schedules, and each schedule has a start and an end date. Therefore, an event can have several start and end dates.

In [100]:
fig = px.scatter(df_scott, x='start_ts', y="event_name", title="Frequency of starting date per event in Scottish cities")
fig.show()
In [101]:
fig = px.scatter(df_scott, x='end_ts', y="event_name", title="Frequency of ending date per event in Scottish cities")
fig.show()

6.4.4 Grouping Schedules per Event and Scottish City

In [102]:
scott_schedule=df_scott.groupby(['event_name', 'town']).size().reset_index()
scott_schedule=scott_schedule.rename(columns={0: "number_of_times"})
scott_schedule=scott_schedule.sort_values(by=['number_of_times'], ascending=False)
scott_schedule
Out[102]:
event_name town number_of_times
6673 St Andrews Ghost Tours St Andrews 2208
4816 Mercat Tours: Evening of Ghost and Ghouls Edinburgh 2184
4817 Mercat Tours: Ghostly Underground Edinburgh 2184
4819 Mercat Tours: Historic Underground Edinburgh 2184
4820 Mercat Tours: Secrets of Edinburgh's Royal Mile Edinburgh 1456
... ... ... ...
5334 Opportunities for New Musical Theatre After th... Edinburgh 1
5336 Orchestre de Paris 1 Edinburgh 1
5337 Orchestre de Paris 2 Edinburgh 1
5339 Order of the Toad, Basic Hinge + Buffet Lunch Edinburgh 1
8603 ‘Tae be Yersels’: The Betty Boyd Memorial Scot... Edinburgh 1

8604 rows × 3 columns

In [103]:
t=scott_schedule.groupby(["event_name"]).sum().sort_values(by=['number_of_times'], ascending=False)
t
Out[103]:
number_of_times
event_name
St Andrews Ghost Tours 2208
Mercat Tours: Ghostly Underground 2184
Mercat Tours: Evening of Ghost and Ghouls 2184
Mercat Tours: Historic Underground 2184
Mercat Tours: Secrets of Edinburgh's Royal Mile 1456
... ...
Ghøstwriter 1
Shrek 1
Ghost Fleet 1
Getting to Know Asia 1
‘Tae be Yersels’: The Betty Boyd Memorial Scots Language Lecture 1

8575 rows × 1 columns

In [104]:
fig = px.bar(t, title="Frequency of Schedules per event")
fig.show()
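
Because df_scott derives from the exploded tickets dataframe, one schedule appears once per ticket entry (the three identical Bright Club rows above are an example). The counts in this section therefore count ticket rows rather than distinct schedules. If distinct schedules are wanted, one option is to drop duplicates first. The sketch below uses a few rows modelled on the output above; `number_of_schedules` is a column name chosen for the example.

```python
import pandas as pd

# Rows modelled on df_scott: one schedule repeated once per ticket entry.
rows = pd.DataFrame({
    "event_id": [232545, 232545, 232545, 345866, 345866],
    "event_name": ["Bright Club"] * 3 + ["Red Raw"] * 2,
    "town": ["Edinburgh"] * 5,
    "start_ts": ["2019-05-28T20:30:00+01:00"] * 3 + ["2019-05-06T20:30:00+01:00"] * 2,
    "end_ts": ["2019-10-29T20:30:00+00:00"] * 3 + ["2019-10-28T20:30:00+00:00"] * 2,
})

# Keep one row per (event, town, start, end) combination.
schedules = rows.drop_duplicates(subset=["event_id", "town", "start_ts", "end_ts"])

per_event = (
    schedules.groupby(["event_name", "town"])
             .size()
             .reset_index(name="number_of_schedules")
)
print(per_event)
```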

6.4.5 Exploring Tags per Schedule and Scottish Cities

In [105]:
a=df_scott.reset_index(drop=True)
tags_town=a[["event_tags", "town"]]
tags_town=tags_town.explode("event_tags")
tags_town
Out[105]:
event_tags town
0 Comedy Edinburgh
0 Days out Edinburgh
0 Glasgow City of Science Edinburgh
0 Science Edinburgh
0 Stand-up Edinburgh
... ... ...
134076 Reggae Edinburgh
134076 Rock & Pop Edinburgh
134076 Soul Edinburgh
134076 Swing Edinburgh
134076 World Edinburgh

339148 rows × 2 columns

In [106]:
scott_tag=tags_town.groupby(['town', 'event_tags']).size().reset_index()
scott_tag=scott_tag.rename(columns={0: "number_of_times"})
scott_tag=scott_tag.sort_values(by=['number_of_times'], ascending=False)
scott_tag
Out[106]:
town event_tags number_of_times
109 Edinburgh Comedy 53211
576 Edinburgh Theatre 41065
145 Edinburgh Days out 20056
527 Edinburgh Stand-up 15377
309 Edinburgh Kids 13802
... ... ... ...
424 Edinburgh Property Show 1
415 Edinburgh Poety 1
406 Edinburgh Picnic 1
402 Edinburgh Peter Hook 1
752 St Andrews Wine tasting 1

753 rows × 3 columns

In [130]:
fig=px.histogram(scott_tag, x="town", y="number_of_times", histfunc="sum", color="event_tags", title='Frequency of tags in Scottish Cities')
fig.update_layout(legend_traceorder="reversed")
fig.show()
In [108]:
t=scott_tag.groupby(["event_tags"]).sum().sort_values(by=['number_of_times'], ascending=False)
t
Out[108]:
number_of_times
event_tags
Comedy 53235
Theatre 41165
Days out 22285
Stand-up 15377
History 14280
... ...
New Age 1
New Order 1
Noir 1
Orchestra 1
Manic Street Preachers 1

673 rows × 1 columns

6.4.5.1 Exploring the Frequency of schedules tags for Edinburgh

In [109]:
edi_scott_tag=scott_tag[scott_tag['town'].isin(["Edinburgh"])]
edi_scott_tag
Out[109]:
town event_tags number_of_times
109 Edinburgh Comedy 53211
576 Edinburgh Theatre 41065
145 Edinburgh Days out 20056
527 Edinburgh Stand-up 15377
309 Edinburgh Kids 13802
... ... ... ...
425 Edinburgh Property show 1
424 Edinburgh Property Show 1
415 Edinburgh Poety 1
406 Edinburgh Picnic 1
402 Edinburgh Peter Hook 1

654 rows × 3 columns

In [110]:
edi_scott_tag.groupby(["event_tags"]).sum().sort_values(by=['number_of_times'], ascending=False)
Out[110]:
number_of_times
event_tags
Comedy 53211
Theatre 41065
Days out 20056
Stand-up 15377
Kids 13802
... ...
Highland Games & Gatherings 1
Hearts 1
Supernatural 1
Heart of Midlothian 1
5K 1

654 rows × 1 columns

In [111]:
fig = px.bar(edi_scott_tag, x='town', y='number_of_times', color='event_tags', barmode='group', title="Frequency of schedules tags for Edinburgh")
fig.show()
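
The tag counts above treat "Property Show" and "Property show" as different tags. If case and surrounding whitespace should not matter, the tags can be normalised before grouping. This is a sketch on invented rows; `tag_norm` is a column name introduced for the example, and it does not address misspellings such as "Poety".

```python
import pandas as pd

# Hypothetical exploded (town, tag) rows with case and whitespace variants.
tags_town = pd.DataFrame({
    "town": ["Edinburgh"] * 4,
    "event_tags": ["Property Show", "Property show", "Comedy", " comedy "],
})

# Strip whitespace and fold case so variants collapse into one tag.
tags_town["tag_norm"] = tags_town["event_tags"].str.strip().str.casefold()

counts = (
    tags_town.groupby(["town", "tag_norm"])
             .size()
             .reset_index(name="number_of_times")
             .sort_values("number_of_times", ascending=False)
)
print(counts)
```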

6.4.6 Histograms of starting/end schedules dates for Edinburgh

In [112]:
scott_start=df_scott.groupby([pd.to_datetime(df_scott['start_ts']), "town"]).size().reset_index()
scott_start=scott_start.rename(columns={0: "number_of_times"})
scott_start=scott_start.sort_values(by=['number_of_times'], ascending=False)
scott_start.reset_index()
Out[112]:
index start_ts town number_of_times
0 3 2019-05-01 10:00:00+01:00 Edinburgh 4000
1 14 2019-05-01 16:00:00+01:00 St Andrews 2208
2 46 2019-05-03 19:00:00+01:00 Edinburgh 2196
3 40 2019-05-03 12:00:00+01:00 Edinburgh 2184
4 41 2019-05-03 13:00:00+01:00 Edinburgh 2184
... ... ... ... ...
4184 634 2019-06-15 14:00:00+01:00 Edinburgh 1
4185 636 2019-06-15 18:30:00+01:00 Edinburgh 1
4186 639 2019-06-15 19:45:00+01:00 Edinburgh 1
4187 640 2019-06-15 20:00:00+01:00 Edinburgh 1
4188 3318 2019-08-27 18:00:00+01:00 Edinburgh 1

4189 rows × 4 columns

In [113]:
ed_scott_start=scott_start[scott_start['town'].isin(["Edinburgh"])].reset_index()
ed_scott_start.groupby(["start_ts"]).sum().sort_values(by=['number_of_times'], ascending=False)
#fig = px.bar(ed_scott_start, x='town', y='number_of_times', color='start_ts', barmode='group', title="Frequency of starting date schedules for Edinburgh")
#fig.show()
Out[113]:
index number_of_times
start_ts
2019-05-01 10:00:00+01:00 3 4000
2019-05-03 19:00:00+01:00 46 2196
2019-05-03 13:00:00+01:00 41 2184
2019-05-03 12:00:00+01:00 40 2184
2019-05-03 10:00:00+01:00 37 1548
... ... ...
2019-09-08 17:00:00+01:00 3428 1
2019-07-04 11:00:00+01:00 999 1
2019-07-04 14:00:00+01:00 1001 1
2019-07-04 14:30:00+01:00 1002 1
2019-08-05 12:35:00+01:00 2035 1

4016 rows × 2 columns

In [114]:
scott_end=df_scott.groupby([pd.to_datetime(df_scott['end_ts']), "town"]).size().reset_index()
scott_end=scott_end.rename(columns={0: "number_of_times"})
scott_end=scott_end.sort_values(by=['number_of_times'], ascending=False)
ed_scott_end=scott_end[scott_end['town'].isin(["Edinburgh"])].reset_index()
ed_scott_end.groupby(["end_ts"]).sum().sort_values(by=['number_of_times'], ascending=False)
#fig = px.bar(ed_scott_end, x='town', y='number_of_times', color='end_ts', barmode='group', title="Frequency of ending date schedules for Edinburgh")
#fig.show()
Out[114]:
index number_of_times
end_ts
2019-10-31 10:00:00+00:00 3928 4389
2019-10-31 21:00:00+00:00 3957 2205
2019-10-31 16:00:00+00:00 3939 2184
2019-10-31 17:00:00+00:00 3941 2184
2019-10-31 00:00:00+00:00 3925 1936
... ... ...
2019-09-10 13:15:00+01:00 3178 1
2019-09-10 17:45:00+01:00 3179 1
2019-09-10 18:00:00+01:00 3180 1
2019-09-10 19:00:00+01:00 3181 1
2019-08-18 17:40:00+01:00 2257 1

3793 rows × 2 columns

In [115]:
fig = px.histogram(ed_scott_start, x='start_ts', y="number_of_times", title="Histogram of Schedules Starting Dates for Edinburgh")
fig.show()
In [116]:
fig = px.histogram(scott_start, x='start_ts', y="number_of_times", title="Histogram of Schedules Starting Dates for Scottish Cities")
fig.show()
In [117]:
fig = px.histogram(scott_end, x='end_ts', y="number_of_times", title="Histogram of Schedules Ending Dates for Scottish Cities")
fig.show()
In [118]:
fig = px.histogram(scott_end, x="end_ts", y="number_of_times", histfunc="sum", title="Histogram on Date Axes")
fig.update_traces(xbins_size="M1")
fig.update_xaxes(showgrid=True, ticklabelmode="period", dtick="M1", tickformat="%b\n%Y")
fig.update_layout(bargap=0.1)
fig.add_trace(go.Scatter(mode="markers", x=scott_end["end_ts"], y=scott_end["number_of_times"], name="daily"))
fig.show()
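
The end_ts values in this section carry two different UTC offsets (+00:00 and +01:00). Depending on the pandas version, parsing such mixed offsets without `utc=True` may not give a proper datetime column. A minimal sketch, assuming that Europe/London is the intended local time zone: parse to UTC first, then convert. The timestamps below are examples in the same format as the data.

```python
import pandas as pd

# Example timestamps with mixed offsets, in the same format as end_ts.
end_ts = pd.Series([
    "2019-10-29T20:30:00+00:00",
    "2019-07-30T21:00:00+01:00",
    "2019-07-30T20:00:00+00:00",  # same instant as the previous one
])

# Parse everything to UTC so the column has a single datetime dtype.
parsed = pd.to_datetime(end_ts, utc=True)

# Convert to local time for display and for grouping by calendar month.
local = parsed.dt.tz_convert("Europe/London")
per_month = local.dt.strftime("%Y-%m").value_counts().sort_index()
print(per_month)
```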

6.4.7 Working with Schedule tags, Scottish cities, Starting/End Time

In [119]:
b=df_scott.reset_index(drop=True)
tag_town_time=b[["event_tags", "town", "start_ts", "end_ts"]]
tag_town_time=tag_town_time.explode("event_tags")
tag_town_time
Out[119]:
event_tags town start_ts end_ts
0 Comedy Edinburgh 2019-05-28T20:30:00+01:00 2019-10-29T20:30:00+00:00
0 Days out Edinburgh 2019-05-28T20:30:00+01:00 2019-10-29T20:30:00+00:00
0 Glasgow City of Science Edinburgh 2019-05-28T20:30:00+01:00 2019-10-29T20:30:00+00:00
0 Science Edinburgh 2019-05-28T20:30:00+01:00 2019-10-29T20:30:00+00:00
0 Stand-up Edinburgh 2019-05-28T20:30:00+01:00 2019-10-29T20:30:00+00:00
... ... ... ... ...
134076 Reggae Edinburgh 2019-05-07T21:00:00+01:00 2019-07-30T21:00:00+01:00
134076 Rock & Pop Edinburgh 2019-05-07T21:00:00+01:00 2019-07-30T21:00:00+01:00
134076 Soul Edinburgh 2019-05-07T21:00:00+01:00 2019-07-30T21:00:00+01:00
134076 Swing Edinburgh 2019-05-07T21:00:00+01:00 2019-07-30T21:00:00+01:00
134076 World Edinburgh 2019-05-07T21:00:00+01:00 2019-07-30T21:00:00+01:00

339148 rows × 4 columns

In [120]:
scott_tag_end=tag_town_time.groupby([pd.to_datetime(tag_town_time['end_ts']), "event_tags"]).size().reset_index()
scott_tag_end=scott_tag_end.rename(columns={0: "number_of_times"})
scott_tag_end=scott_tag_end.sort_values(by=['number_of_times'], ascending=False)


scott_tag_start=tag_town_time.groupby([pd.to_datetime(tag_town_time['start_ts']), "event_tags"]).size().reset_index()
scott_tag_start=scott_tag_start.rename(columns={0: "number_of_times"})
scott_tag_start=scott_tag_start.sort_values(by=['number_of_times'], ascending=False)
In [121]:
scott_tag_start
Out[121]:
start_ts event_tags number_of_times
49 2019-05-01 10:00:00+01:00 Visual art 3247
31 2019-05-01 10:00:00+01:00 Exhibitions 3079
107 2019-05-01 16:00:00+01:00 Activities 2208
111 2019-05-01 16:00:00+01:00 Walks 2208
110 2019-05-01 16:00:00+01:00 Traditional & Heritage 2208
... ... ... ...
3519 2019-06-22 16:00:00+01:00 Visual art 1
12810 2019-08-24 12:15:00+01:00 Kids 1
3518 2019-06-22 16:00:00+01:00 Painting & Drawing 1
1785 2019-05-25 20:00:00+01:00 Techno 1
13773 2019-09-13 23:00:00+01:00 Clubs 1

16719 rows × 3 columns

6.4.7.1 Frequency of schedules Starting Date in Scottish City

In [122]:
#fig = px.bar(scott_tag_start, x='event_tags', y='start_ts', color='number_of_times', barmode='group', title="Frequency of schedules tags per Scottish City")
#fig.show()

fig = px.scatter(scott_tag_start, x='start_ts', y='number_of_times', title="Frequency of schedules Starting Date in Scottish City.")
fig.show()

6.4.7.2 Frequency of schedules Ending Date in Scottish City

In [123]:
fig = px.scatter(scott_tag_end, x='end_ts', y='number_of_times', title="Frequency of schedules Ending Date in Scottish City.")
fig.show()

6.4.7.3 Scheduled tags and Starting Dates in Scottish City

In [124]:
fig = px.scatter(scott_tag_start, x='start_ts', y='event_tags', title="Scheduled Tags and Starting Dates in Scottish City.")
fig.show()

6.4.7.4 Scheduled Tags and Ending Dates in Scottish City

In [125]:
fig = px.scatter(scott_tag_end, x='end_ts', y='event_tags', title="Scheduled Tags and Ending Dates in Scottish City.")
fig.show()
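
With several hundred distinct tags on the y axis, the scatter plots in this section can be hard to read. One option, sketched below on invented rows, is to keep only the most frequent tags and count them per month with `pd.Grouper`. The value of `top_k` and the monthly frequency are choices made for this example.

```python
import pandas as pd

# Hypothetical exploded (tag, start date) rows.
tag_time = pd.DataFrame({
    "event_tags": ["Comedy", "Comedy", "Comedy", "Theatre", "Theatre", "Kids"],
    "start_ts": pd.to_datetime([
        "2019-05-01T10:00:00+01:00", "2019-05-03T19:00:00+01:00",
        "2019-06-15T20:00:00+01:00", "2019-05-03T12:00:00+01:00",
        "2019-06-15T14:00:00+01:00", "2019-05-01T16:00:00+01:00",
    ], utc=True),
})

top_k = 2  # assumption: number of tags to keep
top_tags = tag_time["event_tags"].value_counts().nlargest(top_k).index

# Count rows per calendar month (month start) and tag, for the top tags only.
monthly = (
    tag_time[tag_time["event_tags"].isin(top_tags)]
    .groupby([pd.Grouper(key="start_ts", freq="MS"), "event_tags"])
    .size()
    .reset_index(name="number_of_times")
)
print(monthly)
```

The resulting frame has the same columns as scott_tag_start, so it could be passed to the same px.scatter or px.bar calls.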